Microsoft COCO: Common Objects in Context

Authors

Tsung-Yi Lin

Michael Maire

Serge J. Belongie

James Hays

Pietro Perona

Deva Ramanan

Piotr Dollár

C. Lawrence Zitnick

Abstract

We present a new dataset with the goal of advancing the state of the art in object recognition by placing the question of object recognition in the context of the broader question of scene understanding. This is achieved by gathering images of complex everyday scenes containing common objects in their natural context. Objects are labeled using per-instance segmentations to aid in precise object localization. Our dataset contains photos of 91 object types that would be easily recognizable by a 4-year-old. With a total of 2.5 million labeled instances in 328k images, the creation of our dataset drew upon extensive crowd worker involvement via novel user interfaces for category detection, instance spotting, and instance segmentation. We present a detailed statistical analysis of the dataset in comparison to PASCAL, ImageNet, and SUN. Finally, we provide baseline performance analysis for bounding box and segmentation detection results using a Deformable Parts Model.

Similar resources

ChatPainter: Improving Text to Image Generation using Dialogue

Synthesizing realistic images from text descriptions on a dataset like Microsoft Common Objects in Context (MS COCO), where each image can contain several objects, is a challenging task. Prior work has used text captions to generate images. However, captions might not be informative enough to capture the entire image and may be insufficient for the model to be able to understand which objects in the i...

Oracle MCG: A first peek into COCO Detection Challenges

Microsoft COCO [2] is a new annotated database in computer vision consisting of more than 200,000 images. There are currently more than one million annotated objects from 80 categories, with fully segmented masks. With respect to Pascal [1], the previously available dataset with semantic segmentation annotations, COCO has four times the number of categories and two orders of magnitude more images...

Fine-tuning deep CNN models on specific MS COCO categories

Fine-tuning of a deep convolutional neural network (CNN) is often desired. This paper provides an overview of our publicly available py-faster-rcnn-ft software library that can be used to fine-tune the VGG_CNN_M_1024 model on custom subsets of the Microsoft Common Objects in Context (MS COCO) dataset. For example, we improved the procedure so that the user does not have to look for suitable image file...

Ronchi and Perona: Describing Common Human Visual Actions in Images

Which common human actions and interactions are recognizable in monocular still images? Which involve objects and/or other people? How many is a person performing at a time? We address these questions by exploring the actions and interactions that are detectable in the images of the MS COCO dataset. We make two main contributions. First, a list of 140 common ‘visual actions’, obtained by analyz...
